Conversation

@bw4sz
Collaborator

@bw4sz bw4sz commented Dec 30, 2025

  • This PR favors trainer.validate over main.evaluate since it closely matches PyTorch Lightning. I considered renaming main.evaluate to match, or deprecating it to hide it from users; I welcome input on that.
  • We want one standard evaluation path, then assert that its behavior matches the inference path. I started down this road by adding tests to check parity between main.evaluate and trainer.validate.
  • I also simplified the outputs of trainer.validate; if this is going to be the primary evaluation vehicle, it has too many mAP products for the average user.
  • I removed the size and batch size arguments from main.evaluate and routed them through the config; in general we want the config to control as much as possible, as we discussed in #1240 (Image resizing and performance during training and validation).
  • I simplified the prediction dataset workflow to accept lists instead of tensors. This is faster, cleaner, and easier to read. There is one additional complication for the MultiImage dataset: since the batch comes from several images, we need to keep track of which image each patch came from in order to put it all back together.
  • I wrote several tests to assert consistency of behavior between evaluate and predict, making the whole system more cohesive.
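For intuition on the MultiImage bookkeeping mentioned above, here is a minimal sketch. The function name and the `sublist_lengths` variable are illustrative assumptions, not the actual DeepForest implementation: a MultiImage batch flattens patches from several images into one list, so we record how many patches each image contributed and use those lengths to regroup the flattened predictions.

```python
def regroup_by_image(flat_predictions, sublist_lengths):
    """Split a flat list of per-patch predictions back into per-image groups.

    flat_predictions: predictions, one per patch, in batch order.
    sublist_lengths: number of patches each source image contributed.
    """
    grouped, offset = [], 0
    for length in sublist_lengths:
        grouped.append(flat_predictions[offset:offset + length])
        offset += length
    return grouped

# Example: two images contributing 4 and 3 patches respectively.
preds = [f"p{i}" for i in range(7)]
groups = regroup_by_image(preds, [4, 3])
# groups[0] holds the first image's 4 patches, groups[1] the second's 3.
```

The running `offset` is what ties each slice of the flat batch back to its source image.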

AI-Assisted Development

  • [x] I used AI tools (e.g., GitHub Copilot, ChatGPT, etc.) in developing this PR
  • [x] I understand all the code I'm submitting
  • [x] I have reviewed and validated all AI-generated code

AI tools used (if applicable):

I used Cursor planning mode to help structure the tests, which I then edited and simplified.


Note

Major refactor to streamline augmentation, evaluation, prediction, and I/O with extensive tests.

  • Migrate augmentations to kornia (remove albumentations), add ZoomBlur and RandomPadTo, and switch dataset transforms to AugmentationSequential
  • Standardize evaluation via trainer.validate with simplified mAP logging; deprecate main.evaluate (internal __evaluate__ retained)
  • Simplify prediction datasets: use list-based batches, track sub-batch/window indices for MultiImage and tiled rasters, and update predict/predict_tile postprocessing
  • Overhaul utilities.read_file and geospatial handling with DeepForest_DataFrame, explicit image_path/label/root_dir assignment, and improved COCO/shape conversions
  • Update visualization to infer dimensions from image/root_dir (remove explicit width/height), and harden callbacks for empty annotations
  • Config/schema: add log_root; crop model: add bbox expand and dataset support
  • Adjust datasets (training, prediction, cropmodel) for new transforms, normalization, and box filtering; ensure 3-channel checks
  • Update docs (user guide, HISTORY), README link fix, and add kornia dependency; tweak codecov.yml to mark patch status informational
  • Add comprehensive tests for augmentations, datasets, callbacks, CLI, evaluation parity, crop model, DETR, and prediction batching

Written by Cursor Bugbot for commit 71728ea. This will update automatically on new commits.

@cursor cursor bot left a comment


This PR is being reviewed by Cursor Bugbot


@bw4sz bw4sz added this to the DeepForest 2.1 milestone Jan 7, 2026
if len(self.sublist_lengths) > 0:
    batch_sublist_lengths = self.sublist_lengths[batch_idx]
    for idx, sub_idx in batch_sublist_lengths:
        result = self.format_batch(batch[sub_idx], idx, sub_idx)

Wrong predictions accessed due to reset sub_idx in flattened batch

High Severity

In MultiImage.postprocess, batch[sub_idx] indexes into the flattened prediction results using sub_idx, which resets to 0 for each image in the batch (as stored in sublist_lengths). This causes predictions for the second and subsequent images to incorrectly retrieve prediction results from the first image. For example, with 2 images having 4 patches each, when processing the second image (idx=1), sub_idx values 0,1,2,3 cause batch indices 0–3 to be accessed instead of 4–7. A running index into the flattened batch is needed instead.
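The running-index fix Bugbot describes can be sketched as follows. This is a minimal illustration with hypothetical names, not the actual DeepForest code: instead of indexing the flattened batch with the per-image `sub_idx` (which restarts at 0 for every image), keep a separate counter that only ever advances.

```python
def postprocess_multiimage(batch, index_pairs):
    """Walk a flattened batch of per-patch predictions.

    batch: flat list of predictions for all patches of all images.
    index_pairs: (image_idx, sub_idx) for each patch; sub_idx resets to 0
        for every new image, so it cannot be used to index `batch` directly.
    """
    results = []
    flat_idx = 0  # running position in the flattened prediction list
    for image_idx, sub_idx in index_pairs:
        prediction = batch[flat_idx]  # NOT batch[sub_idx]
        results.append((image_idx, sub_idx, prediction))
        flat_idx += 1
    return results

# Two images with 2 patches each: sub_idx restarts at 0 for image 1,
# but flat_idx keeps advancing, so image 1 reads batch[2] and batch[3].
pairs = [(0, 0), (0, 1), (1, 0), (1, 1)]
out = postprocess_multiimage(["a", "b", "c", "d"], pairs)
```

With the buggy `batch[sub_idx]` indexing, the second image would re-read `"a"` and `"b"`; the running index reads `"c"` and `"d"` as intended.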

Additional Locations (1)


@bw4sz bw4sz changed the title [WIP] Simplify evaluation and create tests to assert expected eval behavior Simplify evaluation and prediction to mirror training datasets and create tests to assert expected eval behavior Jan 12, 2026
@bw4sz bw4sz self-assigned this Jan 12, 2026
Collaborator

@jveitchmichaelis jveitchmichaelis left a comment


Only a few comments for readability, plus this needs a rebase, but looks good. I think using Lightning-y methods is the right direction to avoid confusion. Once this is in, I'll rebase #1256, which should make this even tidier.

@bw4sz
Collaborator Author

bw4sz commented Jan 16, 2026

@jveitchmichaelis I can rebase if you are ready.

@bw4sz bw4sz force-pushed the simplify_evaluate branch 6 times, most recently from 71728ea to ffcd713 Compare January 16, 2026 18:24
@bw4sz
Collaborator Author

bw4sz commented Jan 16, 2026

@jveitchmichaelis , @ethanwhite and I just discussed this. In this case, when you are ready, git squash and merge and we will just eat these history changes. I have learned my lesson about how to do this correctly in the future.

Member

@ethanwhite ethanwhite left a comment


Flagging to stop merge while I finish understanding the history complexities here

- Make trainer.validate the preferred evaluation method and standardize train, eval, and predict to accept lists, not batches.
- Add a sublist concept for MultiImage datasets.
- Ensure predict_file follows the root_dir criteria for read_file.
paths (List[str]): A list of image paths.
patch_size (int): Size of the patches to extract.
patch_overlap (float): Overlap between patches.
size (int): Target size to resize images to. Optional; if not provided, no resizing is performed.
Collaborator

@jveitchmichaelis jveitchmichaelis Jan 18, 2026


Remove unused arg in docstring L30 + reference at L21

@jveitchmichaelis
Collaborator

@bw4sz re-reading this in light of our discussion on Thursday to make sure I'm clear. Is the idea that image size for prediction is solely set via patch_size + overlap?

@ethanwhite if you're happy, could you approve changes to lift the merge block, please? Then I'll merge.

Collaborator

@jveitchmichaelis jveitchmichaelis left a comment


Looks good to me. We should probably aim to improve our documentation on how sizes flow through the model as well, but not critical right now.

@bw4sz
Collaborator Author

bw4sz commented Jan 19, 2026

> @bw4sz re-reading this in light of our discussion on Thursday to make sure I'm clear. Is the idea that image size for prediction is solely set via patch_size + overlap?
>
> @ethanwhite if you're happy, could you approve changes to lift the merge block, please? Then I'll merge.

Yes, that's right, aside from the internal RetinaNet resizing, which still applies for that model.
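For intuition on how patch_size and patch_overlap alone determine the prediction windows, here is an illustrative one-dimensional sketch. This is not DeepForest's actual windowing code (which uses its own tiling utilities); the function name and edge-handling choice are assumptions for the example.

```python
def window_origins(image_dim, patch_size, patch_overlap):
    """Return start offsets of tiles along one image dimension.

    patch_overlap is a fraction (e.g. 0.1 -> 10% overlap), so the
    stride between consecutive windows is patch_size * (1 - overlap).
    """
    stride = max(1, int(patch_size * (1 - patch_overlap)))
    origins = list(range(0, max(image_dim - patch_size, 0) + 1, stride))
    # Ensure the right/bottom edge is covered by a final window.
    last = image_dim - patch_size
    if last > 0 and origins[-1] != last:
        origins.append(last)
    return origins

# 1000 px image, 400 px patches, 10% overlap -> 360 px stride.
print(window_origins(1000, 400, 0.1))  # -> [0, 360, 600]
```

Under this scheme the effective image size seen by the model is always patch_size, which matches the idea that no separate size argument is needed for prediction.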

@jveitchmichaelis jveitchmichaelis merged commit 7010aeb into main Jan 20, 2026
9 checks passed
@jveitchmichaelis jveitchmichaelis deleted the simplify_evaluate branch January 20, 2026 16:41
